Welcome to our workshop on building and compiling software on Hoffman2.
This knowledge can translate to building software on most other HPC resources.
This session is designed to teach you to install and manage most scientific software on Hoffman2, without the need for constant assistance from system admins.
We will explore:
This presentation and accompanying materials are available on 🔗 UCLA OARC GitHub Repository
You can view the slides in:
Each file provides detailed instructions and examples on the various topics covered in this workshop.
Note
Let’s understand the basic concepts involved in installing software from source code:
In most cases, scientific software comes with all these tools and files already set up.
Installing software on most HPC systems generally involves:
We will explore R, Python, and Anaconda in another workshop.
Hoffman2 provides modulefiles for different versions of GCC and Intel Compilers.
These modules establish your Hoffman2 environment to use compilers, libraries, and other software pre-installed by Hoffman2 admins.
Here’s how you list available modules:
Use module load <module_name>/<version> to load the required compilers into your environment.
The default GCC version on Hoffman2 (and other CentOS 7 systems) is 4.8.5.
This version may be too old for modern software, but Hoffman2 has other versions installed that can be accessed by loading modules.
Here’s how to list all available GCC versions on Hoffman2:
Here’s how to use a specific GCC version:
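A minimal sketch, assuming the module command is available in your shell and using gcc/10.2.0, the version loaded later in this workshop. The variable gcc_status is only an illustrative name, and the guard keeps the snippet harmless on machines without environment modules.

```shell
if command -v module >/dev/null 2>&1; then
    module avail gcc            # list the GCC modules that are installed
    module load gcc/10.2.0      # switch your environment to this GCC version
    gcc --version | head -n 1   # confirm which gcc is now active
    gcc_status="requested gcc/10.2.0"
else
    gcc_status="module command not available on this machine"
fi
echo "$gcc_status"
```

Running module list afterwards shows everything currently loaded in your session.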
💡 Intel’s sophisticated compilers for C, C++, and Fortran are optimized to leverage specific features of Intel processors, often resulting in faster and more efficient code.
oneAPI, Intel’s comprehensive suite, includes these compilers along with MPI, GPU, Math, and other libraries.
C compiler: icc (icx for newer Intel versions)
C++ compiler: icpc (icpx for newer Intel versions)
Fortran compiler: ifort
List all available Intel versions on Hoffman2:
Use a specific Intel version:
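As a sketch for the Intel compilers, assuming the intel/2022.1.1 module (the version used in the environment script later in this workshop) and a shell where module is defined. The variable intel_cc is an illustrative name.

```shell
intel_cc=""
if command -v module >/dev/null 2>&1; then
    module avail intel           # list the installed Intel modules
    module load intel/2022.1.1   # load one specific version
fi

# Newer oneAPI releases ship icx, older ones ship icc; use whichever is found.
for candidate in icx icc; do
    if command -v "$candidate" >/dev/null 2>&1; then
        intel_cc="$candidate"
        break
    fi
done
echo "Intel C compiler found: ${intel_cc:-none}"
```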
The choice of compiler depends on the specific requirements of your software and your project’s needs.
For general use, GCC is an excellent choice due to its wide support and availability.
For Intel-specific optimizations, or if your software specifically requires it, use Intel Compilers.
Other Compilers:
NVIDIA CUDA: modules_lookup -m cuda
NVIDIA HPC SDK: modules_lookup -m hpcsdk
💡 MPI (Message Passing Interface) is a standardized library specification for message passing between processes, typically in parallel computing environments.
It allows programs to run on multiple processors or compute nodes at once, communicating as needed, which can greatly speed up computation times.
If your code is set up to run in parallel with MPI, you can compile the software with MPI compiler wrappers and libraries.
Intel MPI is Intel’s proprietary implementation of the MPI specification.
It is optimized to leverage the capabilities of Intel processors and networking technologies, offering high performance and scalability.
If you have the intel module loaded, it comes with Intel MPI.
Intel MPI commands with Intel compilers
Intel MPI commands with GCC compilers
Open MPI is a popular, open-source MPI implementation that is widely used in the HPC community. It supports a wide range of platforms and networking technologies. Use mpicc, mpic++, or mpifort to compile your programs with Open MPI.
List all available Open MPI versions on Hoffman2:
Open MPI compilers:
The decision between Intel MPI and Open MPI is primarily influenced by the specific requirements of your application, including the hardware architecture, performance considerations, and code compatibility.
Intel MPI tends to deliver enhanced performance on Intel hardware, while Open MPI offers wide-ranging compatibility and is open-source, providing flexibility for customization.
Always refer to your software’s documentation for its recommendations and the configurations it has been tested on. While some codes might operate with any MPI implementation or version, others may have specific requirements.
Mathematical libraries, providing precompiled and performance-optimized mathematical routines, are key accelerators of scientific computations. Many scientific software packages rely on them for efficient numerical computation.
Be sure to check your software’s documentation for any specific library requirements.
Intel MKL is a leading math library for high-performance computing. It provides optimized versions of BLAS, LAPACK, and other libraries.
The intel module on Hoffman2 already includes these libraries.
Most software will automatically locate and link these Math libraries during the build process.
The source code of most research software is typically available on the internet. Refer to your software’s documentation to find out where you can download the code.
💻 Transferring Code from Your Local Machine
If the code is on your local computer, you can transfer it to Hoffman2 using the scp command:
⬇️ Downloading Code Directly to Hoffman2
Alternatively, you can download the code directly to Hoffman2 from the internet using wget or curl:
📦 Unpacking Compressed Files
If your code is compressed in a .tar.gz or .zip file, you can uncompress it using the tar or unzip command:
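For example, the sketch below builds a small archive in a scratch directory and then unpacks it, so it can be run anywhere. The names unpack_demo, mycode, and mycode.tar.gz are made up for the illustration.

```shell
unpack_demo=$(mktemp -d)
cd "$unpack_demo"

# Create a stand-in for a downloaded source archive.
mkdir mycode
echo "int main(void) { return 0; }" > mycode/main.c
tar -czf mycode.tar.gz mycode
rm -r mycode

# List the contents first, then extract.
tar -tzf mycode.tar.gz
tar -xzf mycode.tar.gz   # -x extract, -z gunzip, -f archive file

# For a .zip archive the equivalent would be: unzip mycode.zip
ls mycode
```

Listing an archive before extracting it is a cheap way to see whether it unpacks into its own directory or scatters files into the current one.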
📦 Cloning from GitHub
If your code is hosted on a git repository (like GitHub), you can clone the repository:
We’ve now learned different methods to get our software code onto Hoffman2, whether it’s transferring from our local machine, downloading directly, or cloning from a git repository.
In the next steps, we’ll look at how to build and compile this code.
💡 make is a powerful tool for build automation, leveraging a Makefile to guide program compilation and linking processes.
The Makefile is a file that outlines rules defining dependencies between files and the commands to create or update them.
The common usage pattern is make <target>, where <target> is the name of a rule in the Makefile.
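To make the idea of targets concrete, here is a small sketch with a made-up Makefile containing two targets, hello.txt and clean. The variable make_demo and the file names are illustrative. printf is used to write the file because recipe lines must start with a tab character.

```shell
make_demo=$(mktemp -d)
cd "$make_demo"

# Each rule is "target: prerequisites" followed by tab-indented commands.
printf '.PHONY: clean\n\nhello.txt: name.txt\n\tsed "s/^/Hello, /" name.txt > hello.txt\n\nclean:\n\trm -f hello.txt\n' > Makefile
echo "Hoffman2" > name.txt

make hello.txt   # builds hello.txt because it does not exist yet
make hello.txt   # nothing to do: hello.txt is not older than name.txt
cat hello.txt
```

Running make clean would remove hello.txt, after which make hello.txt rebuilds it.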
Often, when you download source code, you’ll encounter a configure script in the top level directory of your software.
Part of the GNU build system or autotools, the configure script ensures the build environment is properly set up, dependencies are checked, and a suitable Makefile is created for your system.
This configure script is usually pre-included with software packages, though it can be generated using the autoconf tool.
Running ./configure typically initiates the following actions:
All you need to do is run the configure script to start your install. You can customize the build by passing options to ./configure.
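For example, you can set the Fortran compiler to gfortran, the C compiler to gcc, the C++ compiler to g++, and the install location to $HOME/apps/myapp.
Use ./configure --help to see a list of options.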
Upon the successful completion of ./configure:
Run make to initiate software compilation.
Run make install to relocate the software to the final directory set by --prefix.
Quantum ESPRESSO is an open-source software package for electronic structure calculations using density functional theory. It simulates and analyzes atomic and molecular behavior at the quantum level, contributing to our understanding of various physical and chemical phenomena.
Below is a step-by-step guide to install Quantum ESPRESSO:
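The usual sequence can be sketched as a small shell function. This is a generic configure, make, make install outline rather than Quantum ESPRESSO's official instructions. The function name configure_make_install, the source directory q-e-src, the install path, and the build target all in the usage comment are assumptions, so check the Quantum ESPRESSO documentation for the targets and options your version expects.

```shell
# configure_make_install <source_dir> <install_prefix> <build_target> [configure options...]
configure_make_install() {
    src_dir=$1
    prefix=$2
    target=$3
    shift 3
    (
        # Subshell: the cd does not change the caller's directory.
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" "$@" &&   # check the system, write the Makefile
        make "$target" &&                        # compile the requested target
        make install                             # copy the results under the prefix
    )
}

# Hypothetical usage, after loading compiler modules and unpacking the source:
#   configure_make_install "$HOME/q-e-src" "$HOME/apps/qe" all FC=gfortran CC=gcc
```

Passing the target explicitly matters for packages whose Makefile does not build everything by default.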
💡 Many software packages use cmake for build automation.
It uses CMakeLists.txt files to define how your program should be built.
Unlike make, cmake is platform independent and can generate native build scripts (e.g., makefiles on Unix and projects/workspaces in MSVC).
Normally, you first create a build directory for the compilation, then run cmake <path to source>, and finally make.
GROMACS (GROningen MAchine for Chemical Simulations) is a comprehensive package to execute molecular dynamics simulations, primarily of biochemical molecules like proteins and lipids. It is highly optimized for a range of hardware platforms and provides a plethora of calculation types, integration algorithms, and analysis tools.
We’ll utilize GROMACS to guide you through this build process.
💾 Download the code:
wget https://ftp.gromacs.org/gromacs/gromacs-2023.2.tar.gz
tar -vxf gromacs-2023.2.tar.gz
cd gromacs-2023.2
📁 Create a build directory:
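A build directory keeps the generated files out of the source tree. A minimal sketch, run from the top of the unpacked source (the name build is a convention, not a requirement):

```shell
mkdir -p build   # -p: no error if the directory already exists
cd build
pwd              # the path printed should now end in /build
```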
At this point, we’re currently located in the directory /../../gromacs-2023.2/build. The main source is still accessible at /../../gromacs-2023.2 or simply ..
🔧 Configure the build:
module load cmake
module load gcc/10.2.0
FC=gfortran CC=gcc CXX=g++ cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_INSTALL_PREFIX=$HOME/apps/gromacs/2023.2
Here, -DGMX_BUILD_OWN_FFTW=ON and -DREGRESSIONTEST_DOWNLOAD=ON are GROMACS-specific options.
The -DCMAKE_INSTALL_PREFIX directive instructs cmake on the destination directory for the final compiled code.
After you run cmake, you will run make to compile the code, then make install to install to the final location.
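The whole cmake sequence can be sketched as a shell function. The function name cmake_install and its arguments are illustrative. cmake --build is used here because it calls the underlying build tool for you, which is make when cmake generates Makefiles.

```shell
# cmake_install <source_dir> <install_prefix> [extra -D options...]
cmake_install() {
    src_dir=$(cd "$1" && pwd) || return 1   # absolute path to the source tree
    prefix=$2
    shift 2
    mkdir -p "$src_dir/build" || return 1
    (
        # Subshell: the cd does not change the caller's directory.
        cd "$src_dir/build" || exit 1
        cmake "$src_dir" -DCMAKE_INSTALL_PREFIX="$prefix" "$@" &&   # configure
        cmake --build . &&                                          # same job as: make
        cmake --build . --target install                            # same job as: make install
    )
}

# Usage with the GROMACS options from this section, after loading cmake and gcc:
#   export FC=gfortran CC=gcc CXX=g++
#   cmake_install gromacs-2023.2 "$HOME/apps/gromacs/2023.2" \
#       -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON
```

If a configuration goes wrong, deleting the build directory and starting again is usually safer than re-running cmake on top of a stale cache.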
Certain software applications may require additional libraries or tool packages to function correctly. In such cases, you need to compile these dependency packages or libraries before you compile your main software.
On HPC environments like Hoffman2, many pre-compiled libraries can be conveniently loaded using the module load command.
Consider the following example of QUILL, a computational chemistry SCF code:
QUILL was an early attempt of mine at learning Hartree-Fock in graduate school, which you can check out at https://github.com/charliecpeterson/QUILL
Once your software is compiled, ensure that you load the EXACT SAME modules during runtime as you did during the compilation.
Update your $PATH and $LD_LIBRARY_PATH variables to incorporate your newly compiled software:
$LD_LIBRARY_PATH tells the system where to find your shared libraries at runtime:
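A minimal sketch, assuming the software was installed under $HOME/apps/myapp with executables in bin and shared libraries in lib. The variable app_prefix is an illustrative name, and the subdirectories should be adjusted to match what your package actually installed.

```shell
app_prefix="$HOME/apps/myapp"

# Prepend so that your build is found before any system copy.
export PATH="$app_prefix/bin:$PATH"

# ${VAR:+...} avoids a stray ':' when LD_LIBRARY_PATH was previously empty.
export LD_LIBRARY_PATH="$app_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the first few search locations to confirm the change.
echo "$PATH" | tr ':' '\n' | head -n 3
```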
These export commands update $PATH and $LD_LIBRARY_PATH to add the directories of your new software.
Warning
Exercise caution when adding export commands to your $HOME/.bashrc file.
Modifications to this file can inadvertently create conflicts and cause errors during future software installations. As a general rule, it’s best to avoid altering $HOME/.bashrc unless necessary.
Note
Creating dedicated bash scripts to load modules and update environment variables for each software can simplify your workflow and minimize potential conflicts. Simply source the appropriate script when needed.
For instance, consider a start-nwchem-7.0.2.sh file:
module load gcc/10.2.0
module load intel/2022.1.1
export PATH=$HOME/apps/nwchem/7.0.2/bin:$PATH
export NWCHEM_BASIS_LIBRARY=$HOME/apps/nwchem/7.0.2/data
To use, run:
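Such a script is meant to be sourced, not executed, so that the variables it sets remain in your current shell. The sketch below writes a tiny stand-in script to a scratch directory and sources it. The application name myapp, the variable MYAPP_HOME, and the file name are made up for the illustration.

```shell
env_dir=$(mktemp -d)

cat > "$env_dir/start-myapp-1.0.sh" <<'EOF'
# Environment for a hypothetical myapp 1.0 installation
export MYAPP_HOME="$HOME/apps/myapp/1.0"
export PATH="$MYAPP_HOME/bin:$PATH"
EOF

# "." is the portable spelling of bash's "source".
. "$env_dir/start-myapp-1.0.sh"
echo "MYAPP_HOME is $MYAPP_HOME"
```

For the NWChem script, the equivalent is source start-nwchem-7.0.2.sh, run from the directory that holds the file.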
You can create several of these scripts for different software and versions. This helps avoid conflicts when installing different software.
💡 Apptainer, previously known as Singularity, is a container platform tailored for High-Performance Computing (HPC) environments.
Check out my previous workshops on USING and BUILDING containers for HPC.
Here’s the general workflow for installing most software on HPC resources:
Get the code: Use git, scp, or wget to transfer your code to your HPC environment.
Load the necessary environment: Set up your shell environment with the required modules using the module load command.
Configure the build: Use ./configure to initialize your installation process. This script sets up the build environment correctly, checks for dependencies, and creates a Makefile suitable for your system.
Utilize CMake (if supported): Some software packages support cmake, which is another powerful build system.
Compile the code: Use the make command to compile your code. This can take anywhere from a few minutes to several hours, depending on the size of your software.
Install the software: Use make install to transfer your compiled software to a specified directory. Remember to indicate this directory during the configuration stage.
Remember to update $PATH and $LD_LIBRARY_PATH when you are ready to use your code.
Note
Always refer to the specific installation documentation provided by the software for any additional steps or requirements.